feat(sdk): Add comments to IR YAML file #8467

JOCSTAA · 2022-11-17T00:42:05Z

Description of your changes:
Implements a feature for the KFP SDK DSL compiler that writes a summary of the pipeline details (name, description, and input/output signatures) as a comment in the IR YAML file.

Checklist:

The title for your pull request (PR) should follow our title convention. Learn more about the pull request title convention used in this repository.

JOCSTAA · 2022-11-17T18:24:12Z

/retest

connor-mccarthy

Thanks, @JOCSTAA!

One high-level consideration: we'll also want to parse the description when we load a component/pipeline from YAML, since this information is not preserved in PipelineSpec. This is so that the description isn't lost when a user authors a pipeline, compiles it, loads it, then compiles it again. We can discuss this offline.

connor-mccarthy · 2022-11-17T18:53:55Z

sdk/python/kfp/compiler/compiler.py

+                description = pipeline_func.description or None
+
+            else:
+                description = None


It's possible I'm missing something, but it looks like this is to handle the fact that GraphComponents have a .description, but other BaseComponents (YamlComponent, PythonComponent) don't. Do you think we could put .description on the BaseComponent abstract base class and implement for all concrete classes, that way all component/pipeline types can be treated the same way by the compiler?

This would also allow us to avoid expanding the interface of write_pipeline_spec_to_file to support comments.
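
A minimal sketch of this suggestion, assuming simplified stand-in classes. These are not the actual kfp classes: the constructors, the `get_description` helper, and the docstring fallback are hypothetical and only illustrate how a shared `.description` removes the type-specific branch in the compiler.

```python
import abc
from typing import Callable, Optional


class BaseComponent(abc.ABC):
    """Hypothetical stand-in for the SDK's abstract component base class."""

    @property
    @abc.abstractmethod
    def description(self) -> Optional[str]:
        """Human-readable description, or None if there is none."""


class PythonComponent(BaseComponent):
    """Stand-in for a component authored from a Python function."""

    def __init__(self, func: Callable):
        self._func = func

    @property
    def description(self) -> Optional[str]:
        # Assumption: the docstring is the description; empty means None.
        return (self._func.__doc__ or '').strip() or None


class GraphComponent(BaseComponent):
    """Stand-in for a pipeline, which already carries a description."""

    def __init__(self, description: Optional[str] = None):
        self._description = description

    @property
    def description(self) -> Optional[str]:
        # Normalize an empty string to None.
        return self._description or None


def get_description(component: BaseComponent) -> Optional[str]:
    # Every concrete class implements .description, so the caller needs
    # no isinstance check and no else branch.
    return component.description
```

With this shape, the compiler can read `component.description` for any component type instead of special-casing pipelines.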

connor-mccarthy · 2022-11-17T19:00:32Z

sdk/python/kfp/compiler/pipeline_spec_builder.py

+    if pipeline_description:
+        comment += '# Description: ' + pipeline_description + '\n'
+    comment += add_inputs()
+    comment += add_outputs()


nit: Consider obtaining all the different sections in a list, then using a string joining method. I think this would slightly improve readability and reduce the likelihood that a \n is omitted on one of the strings somewhere.

'\n'.join(sections)

The same pattern might be helpful in add_inputs and add_outputs
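
A short sketch of the join pattern, under stated assumptions: `build_comment` and its parameters are hypothetical names, and the `# Inputs:` / `# Outputs:` headings are illustrative rather than taken from the PR.

```python
from typing import List, Optional


def build_comment(name: str,
                  description: Optional[str] = None,
                  input_lines: Optional[List[str]] = None,
                  output_lines: Optional[List[str]] = None) -> str:
    """Collects comment lines in a list and joins them exactly once."""
    sections = ['# PIPELINE DEFINITION', f'# Name: {name}']
    if description:
        sections.append(f'# Description: {description}')
    if input_lines:
        sections.append('# Inputs:')
        sections.extend(f'#    {line}' for line in input_lines)
    if output_lines:
        sections.append('# Outputs:')
        sections.extend(f'#    {line}' for line in output_lines)
    # A single join supplies every newline, so none can be forgotten.
    return '\n'.join(sections)
```

Because no section carries its own trailing `\n`, adding or removing a section cannot produce a missing or doubled line break.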

connor-mccarthy · 2022-11-17T19:20:31Z

sdk/python/kfp/compiler/pipeline_spec_builder.py

+    }
+
+    def add_inputs():
+        if 'inputDefinitions' in pipeline_spec['root']:


nit: What do you think about separating the information collection from the information presentation in add_inputs and add_outputs for readability [ref]?
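
One way to read this suggestion, as a sketch only: `collect_inputs` and `format_inputs` are hypothetical names, and the dict layout (`root` → `inputDefinitions` → `parameters`, with `parameterType` and `defaultValue` keys) is an assumption about the serialized spec.

```python
from typing import Any, Dict, List, Tuple

InputInfo = Tuple[str, str, Any]


def collect_inputs(pipeline_spec: Dict[str, Any]) -> List[InputInfo]:
    """Collection: extracts (name, type, default) triples, sorted by name."""
    root = pipeline_spec['root']
    parameters = root.get('inputDefinitions', {}).get('parameters', {})
    return [(name, param.get('parameterType', 'UNKNOWN'),
             param.get('defaultValue'))
            for name, param in sorted(parameters.items())]


def format_inputs(inputs: List[InputInfo]) -> str:
    """Presentation: renders the collected triples as comment lines."""
    lines = []
    for name, type_name, default in inputs:
        line = f'#    {name}: {type_name}'
        if default is not None:
            line += f' [Default: {default}]'
        lines.append(line)
    return '\n'.join(lines)
```

The collection step can then be tested against plain data, and the comment layout can change without touching the code that walks the spec.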

connor-mccarthy · 2022-11-17T19:36:42Z

sdk/python/kfp/compiler/compiler_test.py

@@ -1490,5 +1490,161 @@ def pipeline_with_input(boolean: bool = False):
            .default_value.bool_value, True)


+class TestYamlComments(unittest.TestCase):


These tests look good! A couple more test cases come to mind:

compiling a Python component @dsl.component

compiling a container component @dsl.container_component

compiling a pipeline or component with inputs that don't have default values

compiling a pipeline with an explicit name argument

compiling a pipeline or component with a float, int, dict, and list

compiling a pipeline or component with an artifact input and output

I don't think these each necessarily must be their own test (though that's fine too -- it's more specific but also more verbose).

Bundling these characteristics together and doing full comment assertions (comparing the actual full comment against an expected full comment) could make tackling these many cases a bit more manageable and similar to real-world components we'll be handling.

JOCSTAA · 2022-11-17T20:37:49Z

/retest

JOCSTAA · 2022-11-21T23:09:39Z

/retest

JOCSTAA · 2022-11-22T00:42:41Z

/retest

connor-mccarthy · 2022-11-22T14:40:20Z

sdk/python/kfp/components/structures.py

@@ -816,14 +816,28 @@ def load_from_component_yaml(cls, component_yaml: str) -> 'ComponentSpec':
            Component spec in the form of V2 ComponentSpec.
        """

+        def extract_description(component_yaml: str) -> str:
+            heading = '# Description: '


Is this redundant with extract_description_from_command [ref]? If so, can we remove one?

I think extracting from the comment (this function) is a better approach since it works for both components and pipelines.

Good suggestion, but I tested and confirmed that there are cases where the description is not in the pipeline spec YAML file but is in the comments (for example, when the user sets the description with my_comp.description = 'description' rather than as a function's docstring), and vice versa (a pipeline that was previously compiled without a comment). So I think both are necessary.

Ah, I see. Thanks for explaining. The support for existing IR YAML is a great point. Am I correct that this only applies for compiled components? I believe any pipeline function docstrings would be lost during compilation.

I'm not sure I've ever seen description set via my_comp.description = 'description'. Do you have any example of this?

connor-mccarthy · 2022-11-22T14:42:55Z

sdk/python/kfp/compiler/pipeline_spec_builder.py

+
+        return comment_strings
+
+    if 'root' not in pipeline_spec:


Are there any instances where this condition would be true?

I came across a test case when running the python execution tests which had no 'root'. [ref]

I think the link is broken. Can you provide another?

I looked into this. It's attributed to these tests: sdk/python/kfp/compiler/pipeline_spec_builder_test.py::TestWriteIrToFile, which attempt to write an empty PipelineSpec.

Can you update these tests? It's preferable to not program around our tests in the library code.

connor-mccarthy · 2022-11-22T15:05:58Z

sdk/python/kfp/components/structures.py

+        def extract_description(component_yaml: str) -> str:
+            heading = '# Description: '
+            if heading in component_yaml:
+                description = component_yaml.splitlines()[2]


We'll also need to be able to handle multiline docstrings
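
A possible approach, sketched under an assumption about the comment layout: continuation lines of a multiline description start with `#` followed by enough spaces to align with the text after `# Description: `. The constant names are hypothetical, and this is not the code that was merged.

```python
from typing import Optional

HEADING = '# Description: '
# Assumed continuation prefix: '#' padded to the width of the heading.
CONTINUATION = '#' + ' ' * (len(HEADING) - 1)


def extract_description(component_yaml: str) -> Optional[str]:
    """Returns the (possibly multiline) description from the YAML comment."""
    lines = component_yaml.splitlines()
    for index, line in enumerate(lines):
        if not line.startswith(HEADING):
            continue
        parts = [line[len(HEADING):]]
        # Keep consuming lines while they carry the continuation prefix.
        for following in lines[index + 1:]:
            if not following.startswith(CONTINUATION):
                break
            parts.append(following[len(CONTINUATION):])
        return '\n'.join(parts)
    return None
```

Searching for the heading instead of indexing a fixed line such as `splitlines()[2]` also keeps working if the comment gains or loses a line above the description.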

connor-mccarthy · 2022-11-22T15:10:01Z

sdk/python/kfp/compiler/compiler_test.py

+        name_string = '# Name: my-pipeline'
+
+        # test name is in comments
+        self.assertTrue(name_string in yaml_content)


nit: What you have is perfectly valid, but perhaps use the approach of comparing the full comment string throughout (what you've done in the last test method). It's easier to reason about the correctness of the tests that compare the full comment string.

I do feel that keeping it this way for these first few tests gives more specificity about exactly what we are searching for in the comments, but if you have a strong opinion against it, I could implement it that way.

Not a strong opinion. We can leave it as is if you prefer.

JOCSTAA · 2022-11-23T20:03:16Z

/retest

connor-mccarthy · 2022-11-23T21:54:30Z

@JOCSTAA, ignore kubeflow-pipelines-tfx-python37 test failure for now. I believe this is unrelated to your changes.

chensun · 2022-11-23T23:19:57Z

@JOCSTAA, ignore kubeflow-pipelines-tfx-python37 test failure for now. I believe this is unrelated to your changes.

Once GoogleCloudPlatform/oss-test-infra#1843 is merged, tfx test should no longer be required on master branch.

JOCSTAA · 2022-11-28T20:34:59Z

retest/

JOCSTAA · 2022-11-28T20:39:34Z

retest/

JOCSTAA · 2022-11-29T22:39:50Z

retest/

JOCSTAA · 2022-11-29T22:40:21Z

/test kubeflow-pipelines-sdk-python39

connor-mccarthy

Thank you again for your work on this, @JOCSTAA. The comments are mostly nitpicks related to maintainability.

connor-mccarthy · 2022-11-30T06:14:08Z

sdk/python/kfp/compiler/compiler_test.py

+                #    sample_input3: system.Model
+                #    sample_input4: float [Default: 3.14]
+                #    sample_input5: list [Default: [1.0, 2.0, 3.0]]
+                #    sample_input6: dict [Default: {'one': 1.0, 'three': 3.0, 'two': 2.0}]


nit: String quotes are " above for a default of type string but ' here in a dict

connor-mccarthy · 2022-11-30T06:17:30Z

sdk/python/kfp/compiler/pipeline_spec_builder.py

+    comment_sections.append('# PIPELINE DEFINITION')
+    comment_sections.append('# Name: ' + pipeline_spec['pipelineInfo']['name'])
+    if pipeline_description:
+        pipeline_description = '\n#              '.join(


It looks like the length of this string is based on the size of the '# Description' string. Can you implement that programmatically to make the coupling explicit?
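
A minimal sketch of making that coupling explicit. `DESCRIPTION_PREFIX`, `CONTINUATION_PREFIX`, and `format_description` are hypothetical names, not identifiers from the PR.

```python
DESCRIPTION_PREFIX = '# Description: '
# Derived from the heading, so the alignment follows any change to it.
CONTINUATION_PREFIX = '#' + ' ' * (len(DESCRIPTION_PREFIX) - 1)


def format_description(description: str) -> str:
    """Renders a description so continuation lines align with the first."""
    lines = description.splitlines()
    return DESCRIPTION_PREFIX + ('\n' + CONTINUATION_PREFIX).join(lines)
```

If the heading text is ever renamed, the continuation indent changes with it, and no hand-counted run of spaces is left in the source.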

connor-mccarthy · 2022-11-30T06:18:12Z

sdk/python/kfp/compiler/compiler_test.py

+        # test comments work on compiled container components
+        self.assertIn(predicted_comment, yaml_content)
+
+    def test_comments_indempotency(self):


nit: indempotency -> idempotency

connor-mccarthy · 2022-11-30T06:19:46Z

sdk/python/kfp/compiler/compiler_test.py

+            pipeline_spec_path = os.path.join(tmpdir, 'output.yaml')
+            compiler.Compiler().compile(
+                pipeline_func=my_pipeline, package_path=pipeline_spec_path)
+            comp = components.load_component_from_file(pipeline_spec_path)


nit: You can move this up to the original temporary directory to avoid having to compile again.

JOCSTAA · 2022-12-01T22:29:52Z

/retest

connor-mccarthy · 2022-12-02T22:37:06Z

sdk/python/kfp/compiler/pipeline_spec_builder.py

+
+        return comment_strings
+
+    if 'root' not in pipeline_spec:


I looked into this. It's attributed to these tests: sdk/python/kfp/compiler/pipeline_spec_builder_test.py::TestWriteIrToFile, which attempt to write an empty PipelineSpec.

Can you update these tests? It's preferable to not program around our tests in the library code.

JOCSTAA · 2022-12-05T18:24:25Z

/test kubeflow-pipeline-e2e-test

JOCSTAA · 2022-12-05T19:26:42Z

/retest

connor-mccarthy · 2022-12-05T20:11:20Z

We can ignore kubeflow-pipelines-tfx-python37. It's not supposed to run on merges into master anymore: https://github.com/GoogleCloudPlatform/oss-test-infra/blob/8e2972aa3e72cdd221f03447b10f0d2358e88f9d/prow/prowjobs/kubeflow/pipelines/kubeflow-pipelines-presubmits.yaml#L130. Prow is trying to get it to pass since it once failed on this PR. It will not gate merge.

connor-mccarthy

Thank you for all your hard work on this, @JOCSTAA! This is a really great feature.

/lgtm
/approve

google-oss-prow · 2022-12-05T22:28:52Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: connor-mccarthy
Once this PR has been reviewed and has the lgtm label, please assign ironpan for approval by writing /assign @ironpan in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow · 2022-12-05T22:39:27Z

@JOCSTAA: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
kubeflow-pipelines-tfx-python37	`16e1740`	link	true	`/test kubeflow-pipelines-tfx-python37`
kubeflow-pipeline-e2e-test	`f4a7fa5`	link	true	`/test kubeflow-pipeline-e2e-test`

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

connor-mccarthy · 2022-12-05T22:43:10Z

The kubeflow-pipeline-e2e-test failures are unrelated to these changes. A fix is in progress.

* base
* add tests
* fix bug
* adress comments
* address comments 2
* sort comments
* sort signatures
* add indempotent test
* add indempotent test2
* support multiline docstring
* review
* docformatter presubmit exclude
* docformatter presubmit exclude
* docformatter
* docformatter
* merge 1
* update readme
* nit .items()
* remove reduntant test

JOCSTAA added 2 commits November 15, 2022 14:41

base

c0e10fa

add tests

a214b0c

google-oss-prow bot requested review from connor-mccarthy and zijianjoy November 17, 2022 00:42

google-oss-prow bot added the size/L label Nov 17, 2022

fix bug

ce06c9d

connor-mccarthy requested changes Nov 17, 2022

jlyaoyuli assigned connor-mccarthy Nov 17, 2022

adress comments

cf13883

JOCSTAA requested a review from connor-mccarthy November 21, 2022 23:21

connor-mccarthy requested changes Nov 22, 2022

connor-mccarthy reviewed Nov 22, 2022

address comments 2

c35c194

JOCSTAA requested a review from connor-mccarthy November 22, 2022 19:52

JOCSTAA added 2 commits November 22, 2022 14:59

sort comments

a5853e3

sort signatures

03849d6

JOCSTAA added 2 commits November 23, 2022 14:26

add indempotent test

aeb8f8b

add indempotent test2

16e1740

support multiline docstring

9493b0a

google-oss-prow bot added size/XL and removed size/L labels Nov 28, 2022

connor-mccarthy requested changes Nov 30, 2022

JOCSTAA added 5 commits November 30, 2022 16:05

review

67d54e6

docformatter presubmit exclude

bfc7b46

docformatter presubmit exclude

6914577

docformatter

30ca08e

docformatter

331b261

JOCSTAA added 2 commits December 2, 2022 14:05

merge 1

f940d4b

merged

d289edd

JOCSTAA force-pushed the yaml_comments branch from 2786638 to d289edd Compare December 2, 2022 22:16

update readme

5b16589

connor-mccarthy requested changes Dec 2, 2022

JOCSTAA requested a review from connor-mccarthy December 2, 2022 23:18

nit .items()

8fbebfa

remove reduntant test

f4a7fa5

connor-mccarthy approved these changes Dec 5, 2022

google-oss-prow bot added the lgtm label Dec 5, 2022

connor-mccarthy merged commit 49db63c into kubeflow:master Dec 5, 2022

connor-mccarthy mentioned this pull request Dec 14, 2022

chore(sdk): update golden snapshots with pipeline interface comments #8575

Merged
